Abstract: Strong typicality and the Markov lemma have been
used in the proofs of several multiterminal source coding theorems.
Since these two tools apply to finite alphabets only,
the results proved with them are subject to the same limitation.
Recently, a new notion of typicality, namely unified typicality, has
been defined. It applies to both finite and countably infinite
alphabets, and it retains the asymptotic equipartition property
and the structural properties of strong typicality. In this paper,
unified typicality is used to derive a version of the Markov lemma
that works on both finite and countably infinite alphabets, so
that many results in multiterminal source coding can readily be
extended. Furthermore, a simple way to verify whether some
sequences are jointly typical is shown.

I. Introduction 

The Markov lemma was first used by Berger [1] to extend
multiterminal source coding theory. It has been used in the
achievability part of the coding theorems in source coding with
side information [2, Section 15.8], rate distortion with side
information [2, Section 15.9], channel coding with side information
[3, Section 6.2], a large class of multiterminal noiseless
source coding problems [4], etc. The different versions of the
Markov lemma given in [1]-[4] share the limitation that
none of them can be applied to countably infinite alphabets,
because they are based on strong typicality [1][5]. Note that
the Markov lemma for Gaussian sources has been shown in
[6].

Recently, Ho and Yeung have defined a new notion of
typical sequences, called unified typicality, which works for
countable alphabets¹ [7]. Unified typicality retains the asymptotic
equipartition property and the structural properties of
strong typicality [8]. We will further show in this paper that
unified typicality gives a version of the Markov lemma
for countable alphabets, which can be used to extend the
achievability parts of the aforementioned coding problems.
The new Markov lemma also supports the view that unified
typicality is a suitable notion for generalizing strong typicality to
countable alphabets.

In order to show that some sequences are jointly weakly
typical, we need to show that the $2^k - 1$ nonnegative quantities in [2,
(15.24)] are sufficiently small for a problem with $k$ random variables.
It seems that unified typicality suffers from the same difficulty.
In this paper, we will demonstrate a simple method which
requires showing that only two nonnegative quantities are sufficiently
small in order to show that sequences are jointly unified typical.

¹A countable alphabet is an alphabet which is either finite or countably
infinite.



In the next section, we introduce unified typicality and some
notation. In Section III-A, the Markov lemma which works
on both finite and countably infinite alphabets is stated, and its
consequences are discussed. Then some useful lemmas and the
technique that eases the verification of jointly unified typical sequences
are given in Section III-B, before the new Markov lemma is
proved in Section III-C. In this paper, the base of the logarithm
is 2.

II. Unified Typicality 

Consider some countable alphabets $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$. For any
sequence $\mathbf{y} = (y_1, \ldots, y_n) \in \mathcal{Y}^n$, we say that a sequence
of random variables $\mathbf{X} = (X_1, X_2, \ldots, X_n) \in \mathcal{X}^n$ is drawn
$\sim \prod_i p(x_i|y_i)$ if the $X_i$ are independent and

$$\Pr\{\mathbf{X} = \mathbf{x}\} = \prod_{i=1}^{n} p(x_i|y_i), \tag{1}$$

where $\mathbf{x} = (x_1, \ldots, x_n) \in \mathcal{X}^n$. Let $\mathbf{z} = (z_1, \ldots, z_n) \in \mathcal{Z}^n$.
We call $Q_{XYZ} = \{q(xyz)\}$ the empirical distribution of the
sequences $(\mathbf{X}, \mathbf{y}, \mathbf{z})$, where $q(xyz) = n^{-1} N(x, y, z; \mathbf{X}, \mathbf{y}, \mathbf{z})$
and $N(x, y, z; \mathbf{X}, \mathbf{y}, \mathbf{z})$ is the number of occurrences of $(x, y, z)$
in the sequences $(\mathbf{X}, \mathbf{y}, \mathbf{z})$. Note that $Q_{XYZ}$ is also called
the type of $(\mathbf{X}, \mathbf{y}, \mathbf{z})$ [9], and $Q_{XYZ}$ is a random variable because
$\mathbf{X}$ is random. The marginal distribution $\{q(xy)\}$ is denoted
by $Q_{XY}$, and the other marginal distributions of $Q_{XYZ}$ and
$P_{XYZ} = \{p(xyz)\}$ are defined in a similar fashion. We use
$X - Y - Z$ to denote a Markov chain with respect to $P_{XYZ}$,
i.e., $p(xyz) = p(x|y)p(yz)$ for all $x$, $y$ and $z$. Now, we use
the Kullback-Leibler divergence $D(\cdot\|\cdot)$ and entropy $H(\cdot)$ (see
e.g., [2][5]) to define unified typicality [7]. We always assume
$H(P_{XYZ}) < \infty$.

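Definition 1: The unified jointly typical set $U^n_{[XYZ]\gamma}$ with
respect to $P_{XYZ}$ is the set of sequences $(\mathbf{x}, \mathbf{y}, \mathbf{z}) \in \mathcal{X}^n \times \mathcal{Y}^n \times \mathcal{Z}^n$
such that

$$\begin{aligned}
& D(Q'_{XYZ}\|P_{XYZ}) + |H(Q'_{XYZ}) - H(P_{XYZ})| \\
&\quad + |H(Q'_{XY}) - H(P_{XY})| + |H(Q'_{YZ}) - H(P_{YZ})| \\
&\quad + |H(Q'_{XZ}) - H(P_{XZ})| + |H(Q'_{X}) - H(P_{X})| \\
&\quad + |H(Q'_{Y}) - H(P_{Y})| + |H(Q'_{Z}) - H(P_{Z})| \le \gamma,
\end{aligned} \tag{2}$$

where $Q'_{XYZ} = \{q'(xyz)\}$ is the empirical distribution of
$(\mathbf{x}, \mathbf{y}, \mathbf{z})$ with $q'(xyz) = n^{-1} N(x, y, z; \mathbf{x}, \mathbf{y}, \mathbf{z})$.

The definition of $U^n_{[YZ]\eta}$ is similar to that of $U^n_{[XYZ]\gamma}$, with
$D(Q'_{XYZ}\|P_{XYZ})$ replaced by $D(Q'_{YZ}\|P_{YZ})$ and all the
absolute values involving $X$ being dropped.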
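As a concrete illustration of Definition 1, the following minimal Python sketch evaluates the left side of (2) for sequences over finite alphabets. The function names (`entropy`, `kl_divergence`, `marginal`, `empirical`, `unified_typicality_gap`) and the dictionary representation of distributions are assumptions made for illustration; they are not notation from [7].

```python
import math
from collections import Counter
from itertools import combinations

def entropy(dist):
    """Entropy in bits of a distribution given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def kl_divergence(q, p):
    """D(q || p) in bits; infinite if q puts mass where p has none."""
    total = 0.0
    for s, qs in q.items():
        if qs > 0:
            ps = p.get(s, 0.0)
            if ps == 0:
                return math.inf
            total += qs * math.log2(qs / ps)
    return total

def marginal(dist, coords):
    """Marginal of a joint distribution over tuples on the given coordinates."""
    out = Counter()
    for s, pr in dist.items():
        out[tuple(s[c] for c in coords)] += pr
    return dict(out)

def empirical(seqs):
    """Joint type of equal-length sequences: q'(xyz) = N(x,y,z; x,y,z) / n."""
    n = len(seqs[0])
    counts = Counter(zip(*seqs))
    return {s: c / n for s, c in counts.items()}

def unified_typicality_gap(seqs, p_joint):
    """Left side of (2): divergence plus entropy gaps over all nonempty marginals."""
    q = empirical(seqs)
    k = len(seqs)
    gap = kl_divergence(q, p_joint)
    for r in range(1, k + 1):
        for coords in combinations(range(k), r):
            gap += abs(entropy(marginal(q, coords)) - entropy(marginal(p_joint, coords)))
    return gap

# Hypothetical example: X = Y = Z uniform on {0, 1}.
p_example = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(unified_typicality_gap(([0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1]), p_example))
print(unified_typicality_gap(([0, 0], [0, 0], [0, 0]), p_example))
```

In the first call the joint type equals the assumed $P_{XYZ}$ exactly, so every term of (2) vanishes. In the second call the all-zero triple contributes one bit of divergence plus seven entropy gaps of one bit each, so it belongs to $U^n_{[XYZ]\gamma}$ only for $\gamma \ge 8$.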



III. Main Results 
A. The Markov Lemma 

The Markov lemma for countable alphabets is given in Theorem 1,
and its proof is deferred to Section III-C. In this
paper, we consider only those $P_{XYZ}$ satisfying $H(P_{XYZ}) < \infty$ and

$$\sum_{x} p(x|y) \left(\log p(x|y)\right)^2 < C \tag{3}$$

for all $y \in \mathcal{Y}$, where $C$ is finite. These assumptions enable us to
simplify the proofs by using Chebyshev's inequality.

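Theorem 1: Consider $P_{XYZ}$ with $H(P_{XYZ}) < \infty$. Assume
that (3) is satisfied and $X - Y - Z$. If for any $\gamma > 0$ and
any given $(\mathbf{y}, \mathbf{z}) \in U^n_{[YZ]\eta}$, $\mathbf{X}$ is drawn $\sim \prod_i p(x_i|y_i)$, then

$$\Pr\{(\mathbf{X}, \mathbf{y}, \mathbf{z}) \in U^n_{[XYZ]\gamma}\} > 1 - \gamma \tag{4}$$

for $n$ sufficiently large and $\eta$ sufficiently small.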
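The statement of Theorem 1 can be explored numerically on a small finite alphabet. The sketch below is only an illustration under assumed values: the distributions `P_YZ` and `P_X_GIVEN_Y` are hypothetical, the pair $(\mathbf{y}, \mathbf{z})$ is built so that its type equals $P_{YZ}$ exactly, and only the two quantities $D(Q_{XYZ}\|P_{XYZ})$ and $|H(Q_{XYZ}) - H(P_{XYZ})|$ are tracked as $n$ grows.

```python
import math
import random
from collections import Counter

P_YZ = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # assumed p(yz)
P_X_GIVEN_Y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}      # assumed p(x|y)
# Markov chain X - Y - Z: p(xyz) = p(x|y) p(yz).
P_XYZ = {(x, y, z): P_X_GIVEN_Y[y][x] * pyz
         for (y, z), pyz in P_YZ.items() for x in (0, 1)}

def entropy(dist):
    """Entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def kl_divergence(q, p):
    """D(q || p) in bits; p is assumed positive wherever q is."""
    return sum(qs * math.log2(qs / p[s]) for s, qs in q.items() if qs > 0)

def typical_yz(n):
    """A (y, z) sequence pair whose joint type equals P_YZ (n a multiple of 10)."""
    pairs = []
    for yz, pr in P_YZ.items():
        pairs.extend([yz] * round(pr * n))
    return pairs

def two_term_gap(n, rng):
    """Draw X ~ prod_i p(x_i | y_i); return D(Q||P) + |H(Q) - H(P)| for XYZ."""
    counts = Counter()
    for y, z in typical_yz(n):
        x = 0 if rng.random() < P_X_GIVEN_Y[y][0] else 1
        counts[(x, y, z)] += 1
    q = {s: c / n for s, c in counts.items()}
    return kl_divergence(q, P_XYZ) + abs(entropy(q) - entropy(P_XYZ))

rng = random.Random(0)
for n in (100, 1000, 10000):
    avg = sum(two_term_gap(n, rng) for _ in range(50)) / 50
    print(n, round(avg, 5))
```

Under these assumptions the averaged gap should shrink as $n$ increases, which is the behavior (4) describes. The sketch does not verify the theorem, and a finite alphabet does not exercise the countably infinite case that motivates the paper.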



Remarks: 

i) This is a generalization of [2, Lemma 15.8.1]. Since
unified typicality retains the asymptotic equipartition
property and the structural properties of strong typicality
[7][8], it is straightforward to generalize the achievability parts of
Theorem 15.8.1 and Theorem 15.9.1 in [2] to $X$ and
$Y$ taking values in countable alphabets.

ii) A result similar to [3, (1.27)] with strong typicality
replaced by unified typicality can be easily shown from
Theorem 1.

iii) Theorem 1 readily generalizes the version of the
Markov lemma in [1] to countably infinite alphabets as
follows.

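Corollary 2: Consider $P_{XYZ}$ with $H(P_{XYZ}) < \infty$. Assume
that (3) is satisfied and $X - Y - Z$. If for any $\gamma > 0$
and any given $\mathbf{z} \in U^n_{[Z]\eta}$, $(\mathbf{X}, \mathbf{Y})$ is generated according to
$\Pr\{(\mathbf{X}, \mathbf{Y}) = (\mathbf{x}, \mathbf{y})\} = \prod_{i=1}^{n} p(x_i y_i)$, then

$$\Pr\{(\mathbf{X}, \mathbf{z}) \in U^n_{[XZ]\gamma} \mid (\mathbf{Y}, \mathbf{z}) \in U^n_{[YZ]\eta}\} > 1 - \gamma \tag{5}$$

for $n$ sufficiently large and $\eta$ sufficiently small.

Proof: If $(\mathbf{X}, \mathbf{Y}, \mathbf{z}) \in U^n_{[XYZ]\gamma}$, then $(\mathbf{X}, \mathbf{z}) \in U^n_{[XZ]\gamma}$
from the consistency theorem in [7, Theorem 5]. Therefore,

\begin{align}
& \Pr\{(\mathbf{X}, \mathbf{z}) \in U^n_{[XZ]\gamma} \mid (\mathbf{Y}, \mathbf{z}) \in U^n_{[YZ]\eta}\} \tag{6} \\
&\ge \Pr\{(\mathbf{X}, \mathbf{Y}, \mathbf{z}) \in U^n_{[XYZ]\gamma} \mid (\mathbf{Y}, \mathbf{z}) \in U^n_{[YZ]\eta}\} \tag{7} \\
&= \sum_{\mathbf{y} : (\mathbf{y}, \mathbf{z}) \in U^n_{[YZ]\eta}} \Pr\{\mathbf{Y} = \mathbf{y} \mid (\mathbf{Y}, \mathbf{z}) \in U^n_{[YZ]\eta}\} \cdot \Pr\{(\mathbf{X}, \mathbf{y}, \mathbf{z}) \in U^n_{[XYZ]\gamma} \mid (\mathbf{y}, \mathbf{z}) \in U^n_{[YZ]\eta}\} \tag{8} \\
&> 1 - \gamma, \tag{9}
\end{align}

where (9) follows from Theorem 1. ■

B. Some Lemmas

In order to prove Theorem 1, we first have to establish the
results in this subsection. Let $E_i$ be an event for all $i$. In this
paper, we will frequently use the following lemma and the fact
that if $E_1$ implies $E_2$, then $\Pr\{E_1\} \le \Pr\{E_2\}$.

Lemma 1: If $\Pr\{E_i\} > 1 - \delta_i$, then

$$\Pr\{\cap_i E_i\} > 1 - \sum_i \delta_i. \tag{10}$$

Proof: By the union bound,

$$\Pr\{\cap_i E_i\} = 1 - \Pr\{\cup_i E_i^c\} \ge 1 - \sum_i \Pr\{E_i^c\} > 1 - \sum_i \delta_i.$$

■

In the following lemma, we consider the variational distance
(see e.g., [5]) between $Q_{XYZ}$ and $P_{XYZ}$, which is defined as

$$V(Q_{XYZ}, P_{XYZ}) = \sum_{xyz} |q(xyz) - p(xyz)|. \tag{11}$$

Lemma 2: Assume $X - Y - Z$. If for any $\epsilon > 0$ and any
given $(\mathbf{y}, \mathbf{z}) \in U^n_{[YZ]\eta}$, $\mathbf{X}$ is drawn $\sim \prod_i p(x_i|y_i)$, then

$$\Pr\{V(Q_{XYZ}, P_{XYZ}) < \epsilon\} > 1 - \epsilon \tag{12}$$

for $n$ sufficiently large and $\eta$ sufficiently small.

Proof: The proof is similar to the proof of [1,
Lemma 4.1] except that $P_{XYZ}$ is defined on countable alphabets
here. Fix any $(x, y, z) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$. For $1 \le i \le n$, let $B_i$
be binary and independently distributed. If $(y, z) = (y_i, z_i)$,
let

$$B_i = \begin{cases} 0 & \text{with probability } 1 - p(x|y), \\ 1 & \text{with probability } p(x|y). \end{cases} \tag{13}$$

If $(y, z) \ne (y_i, z_i)$, let $B_i = 0$. Then $N(x, y, z; \mathbf{X}, \mathbf{y}, \mathbf{z})$ and
$\sum_{i=1}^{n} B_i$ have the same distribution on the set of integers. So

$$E[N(x, y, z; \mathbf{X}, \mathbf{y}, \mathbf{z})] = \sum_{i=1}^{n} E[B_i] = p(x|y) N(y, z; \mathbf{y}, \mathbf{z}). \tag{14}$$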



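Since the $B_i$ are binary and independent, the variance of
$N(x, y, z; \mathbf{X}, \mathbf{y}, \mathbf{z})$ is

$$\mathrm{Var}[N(x, y, z; \mathbf{X}, \mathbf{y}, \mathbf{z})] = \sum_{i=1}^{n} \mathrm{Var}[B_i] \le \frac{n}{4}. \tag{15}$$

For any $\delta > 0$, Chebyshev's inequality [2, (3.32)] can be
applied to show

$$\Pr\{|N(x, y, z; \mathbf{X}, \mathbf{y}, \mathbf{z}) - p(x|y) N(y, z; \mathbf{y}, \mathbf{z})| \ge n\delta\} \le \frac{\mathrm{Var}[N(x, y, z; \mathbf{X}, \mathbf{y}, \mathbf{z})]}{(n\delta)^2} \le \frac{1}{4n\delta^2} < \delta, \tag{16}$$

where the last inequality holds for sufficiently large $n$.
Since $q(xyz) = n^{-1} N(x, y, z; \mathbf{X}, \mathbf{y}, \mathbf{z})$ and $q(yz) = n^{-1} N(y, z; \mathbf{y}, \mathbf{z})$,
(16) is equivalent to

$$\Pr\{|q(xyz) - p(x|y) q(yz)| < \delta\} > 1 - \delta. \tag{17}$$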
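The concentration step in (16) and (17) can be checked by simulation. The sketch below assumes the variance bound $\mathrm{Var}[N] \le n/4$ (a Bernoulli variable has variance at most $1/4$), and all numerical values (`n`, `n_yz`, `p`, `delta`) are hypothetical. It compares the observed frequency of a large deviation with the Chebyshev bound $1/(4n\delta^2)$.

```python
import random

def deviation_frequency(n, n_yz, p, delta, trials, rng):
    """Estimate Pr{|N - p * n_yz| >= n * delta}, where N is a sum of n_yz
    independent Bernoulli(p) variables (the B_i at positions matching (y, z))."""
    hits = 0
    for _ in range(trials):
        count = sum(1 for _ in range(n_yz) if rng.random() < p)
        if abs(count - p * n_yz) >= n * delta:
            hits += 1
    return hits / trials

def chebyshev_bound(n, delta):
    """Var[N] / (n * delta)^2 with Var[N] <= n / 4 (Bernoulli variance <= 1/4)."""
    return 1.0 / (4.0 * n * delta ** 2)

rng = random.Random(0)
n, n_yz, p, delta = 2000, 800, 0.3, 0.02   # hypothetical values
freq = deviation_frequency(n, n_yz, p, delta, trials=500, rng=rng)
bound = chebyshev_bound(n, delta)
print(freq, bound)
```

Chebyshev's inequality is loose here, so the observed frequency is expected to sit well below the bound. The bound itself drops below $\delta$ only once $n > 1/(4\delta^3)$, which is what "for sufficiently large $n$" means in (16).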



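Now for any $\epsilon > 0$, let

$$\eta = \frac{\epsilon^2}{32}. \tag{18}$$

Since $(\mathbf{y}, \mathbf{z}) \in U^n_{[YZ]\eta}$, $D(Q_{YZ}\|P_{YZ}) \le \eta = \frac{\epsilon^2}{32}$. By Pinsker's
inequality [2] and the fact that $\ln 2 < 1$,

\begin{align}
\frac{\epsilon}{4} &> \sum_{yz} |q(yz) - p(yz)| \tag{19} \\
&= \sum_{xyz} p(x|y) |q(yz) - p(yz)| \tag{20} \\
&= \sum_{xyz} |p(x|y) q(yz) - p(xyz)|, \tag{21}
\end{align}

where (21) follows from $X - Y - Z$. Let $M = |S|$, where
$S \subset \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$ is a finite subset such that

$$\sum_{(x,y,z) \in S} p(xyz) > 1 - \frac{\epsilon}{8}. \tag{22}$$

Here, the left side of (22) can be made arbitrarily close to 1 by taking $M$
sufficiently large, so such an $S$ must exist. Let $E_{xyz} = 1\{|q(xyz) - p(x|y) q(yz)| < \frac{\epsilon}{8M}\}$
and suppose $E_{xyz} = 1$ for all $(x, y, z) \in S$. Then

$$\sum_{(x,y,z) \in S} |q(xyz) - p(x|y) q(yz)| < \frac{\epsilon}{8}. \tag{23}$$

Together with (21) and the triangle inequality, we have

$$\sum_{(x,y,z) \in S} |q(xyz) - p(xyz)| < \frac{3\epsilon}{8}, \tag{24}$$

and hence,

$$\sum_{(x,y,z) \in S} q(xyz) > \sum_{(x,y,z) \in S} p(xyz) - \frac{3\epsilon}{8} > 1 - \frac{\epsilon}{2}, \tag{25}$$

where the last inequality follows from (22). Thus,

\begin{align}
& \sum_{xyz} |q(xyz) - p(xyz)| \tag{26} \\
&\le \sum_{(x,y,z) \in S} |q(xyz) - p(xyz)| + \sum_{(x,y,z) \notin S} \left(q(xyz) + p(xyz)\right) \tag{27} \\
&= \sum_{(x,y,z) \in S} |q(xyz) - p(xyz)| + \Big(1 - \sum_{(x,y,z) \in S} q(xyz)\Big) + \Big(1 - \sum_{(x,y,z) \in S} p(xyz)\Big) \tag{28} \\
&< \frac{3\epsilon}{8} + \frac{\epsilon}{2} + \frac{\epsilon}{8} = \epsilon, \tag{29}
\end{align}

where (29) follows from (22), (24) and (25). Therefore, if
$E_{xyz} = 1$ for all $(x, y, z) \in S$, then $V(Q_{XYZ}, P_{XYZ}) < \epsilon$.
So we can put $\delta = \frac{\epsilon}{8M}$ into (17) and apply Lemma 1 to show
that when $n$ is sufficiently large,

$$\Pr\{V(Q_{XYZ}, P_{XYZ}) < \epsilon\} \ge \Pr\{\cap_{(x,y,z) \in S} \{E_{xyz} = 1\}\} > 1 - \frac{\epsilon}{8} > 1 - \epsilon. \tag{30}$$

■

We now establish a result regarding the Kullback-Leibler divergence
and entropy difference between $P_{X|YZ}$ and $Q_{X|YZ}$.
In the following lemma, $(\mathbf{y}, \mathbf{z})$ is not necessarily jointly
typical. Also, $Q_{X|Y=y,Z=z}$ and $P_{X|Y=y,Z=z}$ are the probability
distributions of $X$ when $Y = y$ and $Z = z$ are given.
Recall that we consider only those $P_{XYZ}$ satisfying (3) and
$H(P_{XYZ}) < \infty$.

Lemma 3: Assume $X - Y - Z$. If for any $\epsilon > 0$ and any
given $(\mathbf{y}, \mathbf{z})$, $\mathbf{X}$ is drawn $\sim \prod_i p(x_i|y_i)$, then

$$\Pr\Big\{\Big|\sum_{yz} q(yz) \big(D(Q_{X|Y=y,Z=z}\|P_{X|Y=y,Z=z}) + H(Q_{X|Y=y,Z=z}) - H(P_{X|Y=y,Z=z})\big)\Big| < \epsilon\Big\} > 1 - \epsilon \tag{31}$$

for $n$ sufficiently large.

Proof: For $1 \le i \le n$, let $A_i = \log p(X_i|y_i)$. Since the $X_i$
are independent, the $A_i$ are also independent. Together with (3),
an upper bound on the variance of $\sum_{i=1}^{n} A_i$ is given by

$$\mathrm{Var}\Big[\sum_{i=1}^{n} A_i\Big] = \sum_{i=1}^{n} \mathrm{Var}[A_i] \le \sum_{i=1}^{n} E[A_i^2] < nC. \tag{32}$$

By Chebyshev's inequality,

$$\Pr\Big\{\Big|n^{-1} \sum_{i=1}^{n} (E[A_i] - A_i)\Big| \ge \epsilon\Big\} < \frac{C}{n\epsilon^2} < \epsilon \tag{33}$$

when $n$ is sufficiently large. Then

$$\Pr\Big\{\Big|n^{-1} \sum_{i=1}^{n} (E[A_i] - A_i)\Big| < \epsilon\Big\} > 1 - \epsilon, \tag{34}$$

where the left sides of (31) and (34) are equal because

\begin{align}
& n^{-1} \sum_{i=1}^{n} E[A_i] - n^{-1} \sum_{i=1}^{n} A_i \tag{35} \\
&= n^{-1} \sum_{i=1}^{n} \sum_{x} p(x|y_i) \log p(x|y_i) - n^{-1} \sum_{i=1}^{n} \log p(X_i|y_i) \nonumber \\
&= n^{-1} \sum_{y} N(y; \mathbf{y}) \sum_{x} p(x|y) \log p(x|y) - n^{-1} \sum_{xy} N(x, y; \mathbf{X}, \mathbf{y}) \log p(x|y) \tag{36} \\
&= \sum_{xy} p(x|y) q(y) \log p(x|y) - \sum_{xy} q(xy) \log p(x|y) \tag{37} \\
&= \sum_{xyz} \left(p(x|y) q(yz) - q(xyz)\right) \log p(x|y) \tag{38} \\
&= \sum_{yz} q(yz) \sum_{x} \left(p(x|yz) - q(x|yz)\right) \log p(x|yz) \tag{39} \\
&= \sum_{yz} q(yz) \sum_{x} \Big(q(x|yz) \log \frac{q(x|yz)}{p(x|yz)} - q(x|yz) \log q(x|yz) + p(x|yz) \log p(x|yz)\Big), \tag{40}
\end{align}

where (39) follows from $X - Y - Z$. ■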
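If $(\mathbf{y}, \mathbf{z}) \in U^n_{[YZ]\eta}$, the following lemma simplifies (31).

Lemma 4: For any $\epsilon > 0$, there exists $\eta > 0$ such that if
$(\mathbf{y}, \mathbf{z}) \in U^n_{[YZ]\eta}$, then

$$\Big|\sum_{yz} \left(q(yz) - p(yz)\right) H(P_{X|Y=y,Z=z})\Big| \le \epsilon, \tag{41}$$

where $\epsilon \to 0$ as $\eta \to 0$.

Proof: Since

$$\sum_{x} p(x|y) \left(\log p(x|y)\right)^2 \ge \sum_{x : p(x|y) > 0.5} p(x|y) \left(\log p(x|y)\right)^2 + \sum_{x : p(x|y) \le 0.5} p(x|y) \left(-\log p(x|y)\right),$$

it is easily shown that $H(P_{X|Y=y}) < 0.5 + C$ from (3). Since
$p(x|yz) = p(x|y)$ for all $(x, y, z)$ as $X - Y - Z$,

\begin{align}
& \Big|\sum_{yz} \left(q(yz) - p(yz)\right) H(P_{X|Y=y,Z=z})\Big| \tag{42} \\
&= \Big|\sum_{yz} \left(q(yz) - p(yz)\right) H(P_{X|Y=y})\Big| \tag{43} \\
&\le \sum_{yz : q(yz) > p(yz)} \left(q(yz) - p(yz)\right)(0.5 + C) + \sum_{yz : q(yz) < p(yz)} \left(p(yz) - q(yz)\right)(0.5 + C) \tag{44} \\
&= (0.5 + C) \sum_{yz} |p(yz) - q(yz)| \tag{45} \\
&\le (0.5 + C) \sqrt{2\eta \ln 2}, \tag{46}
\end{align}

where (46) follows from $(\mathbf{y}, \mathbf{z}) \in U^n_{[YZ]\eta}$ and Pinsker's
inequality. By letting $\eta = \frac{\epsilon^2}{(0.5 + C)^2 \, 2 \ln 2}$, the lemma is proved.

■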
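The entropy bound used at the start of this proof can be checked numerically for a single distribution. The sketch below evaluates both sides of $H(P) \le 0.5 + \sum_x p(x) (\log p(x))^2$ for truncated geometric distributions; the helper names and the choice of test distributions are assumptions made for illustration only.

```python
import math

def entropy_bits(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def second_log_moment(probs):
    """sum_x p(x) (log2 p(x))^2, the quantity bounded by C in (3)."""
    return sum(p * math.log2(p) ** 2 for p in probs if p > 0)

def truncated_geometric(r, terms):
    """Normalized probabilities proportional to r**k for k = 0, ..., terms - 1."""
    weights = [r ** k for k in range(terms)]
    total = sum(weights)
    return [w / total for w in weights]

for r in (0.1, 0.5, 0.9):
    probs = truncated_geometric(r, 200)
    h, s = entropy_bits(probs), second_log_moment(probs)
    print(r, round(h, 4), round(s, 4), h <= 0.5 + s)
```

The comparison holds for every distribution, not only these: symbols with probability at most $0.5$ satisfy $-\log p \le (\log p)^2$, and the at most one symbol with probability above $0.5$ contributes less than $0.5$ bit to the entropy.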

Now we use Lemma 4 to simplify (31) in the following
lemma, which uses the conditional Kullback-Leibler divergence
$D(Q_{X|YZ}\|P_{X|YZ}|Q_{YZ})$ [10].

Lemma 5: Assume $X - Y - Z$. If for any $\epsilon > 0$ and any
given $(\mathbf{y}, \mathbf{z}) \in U^n_{[YZ]\eta}$, $\mathbf{X}$ is drawn $\sim \prod_i p(x_i|y_i)$, then

$$\Pr\{|D(Q_{X|YZ}\|P_{X|YZ}|Q_{YZ}) + H(Q_{X|YZ}) - H(P_{X|YZ})| < \epsilon\} > 1 - \epsilon \tag{47}$$

for $n$ sufficiently large and $\eta$ sufficiently small.

Proof: For any $\epsilon > 0$, there exists a sufficiently small $\eta$
such that

$$\Big|\sum_{yz} \left(q(yz) - p(yz)\right) H(P_{X|Y=y,Z=z})\Big| < \frac{\epsilon}{2} \tag{48}$$

from Lemma 4. Now, suppose

$$\Big|\sum_{yz} q(yz) \big(D(Q_{X|Y=y,Z=z}\|P_{X|Y=y,Z=z}) + H(Q_{X|Y=y,Z=z}) - H(P_{X|Y=y,Z=z})\big)\Big| < \frac{\epsilon}{2}. \tag{49}$$

Adding (48) and (49) gives

$$|D(Q_{X|YZ}\|P_{X|YZ}|Q_{YZ}) + H(Q_{X|YZ}) - H(P_{X|YZ})| < \epsilon. \tag{50}$$

When $n$ is sufficiently large, the probability that (49) is
satisfied is larger than $1 - \frac{\epsilon}{2} > 1 - \epsilon$ from Lemma 3. Therefore,
the lemma is proved. ■

Before we proceed to apply the established lemmas, we
pause to check that conditional entropy, like entropy,
is lower semicontinuous. Let $P_{A_m B_m} = \{p_{A_m B_m}(ab)\}$ and
$P_{AB} = \{p_{AB}(ab)\}$. Assume $H(P_{A|B}) < \infty$.

Lemma 6: If $\lim_{m \to \infty} V(P_{A_m B_m}, P_{AB}) = 0$, then
$\lim_{m \to \infty} H(P_{A_m|B_m}) \ge H(P_{A|B})$.

Proof: For any $\epsilon > 0$, there exist sufficiently large $L$ and
$M$ such that

$$H(P_{A|B}) < \sum_{b=1}^{M} p_B(b) \tilde{H}(P_{A|B=b}) + \epsilon, \tag{51}$$

where $\tilde{H}(P_{A|B=b}) \triangleq -\sum_{a=1}^{L} p_{A|B}(a|b) \log p_{A|B}(a|b)$. On
the other hand,

\begin{align}
H(P_{A_m|B_m}) &\ge \sum_{b=1}^{M} p_{B_m}(b) H(P_{A_m|B_m=b}) \tag{52} \\
&\ge \sum_{b=1}^{M} p_{B_m}(b) \tilde{H}(P_{A_m|B_m=b}), \tag{53}
\end{align}

where the right side of (53) is a continuous function in
$\{p_{A_m B_m}(ab) : 1 \le a \le L \text{ and } 1 \le b \le M\}$. If
$\lim_{m \to \infty} V(P_{A_m B_m}, P_{AB}) = 0$, then $p_{A_m B_m}(ab) \to p_{AB}(ab)$ for all
$1 \le a \le L$ and $1 \le b \le M$. Following (53), by replacing
$P_{A_m B_m}$ by $P_{AB}$ and $P_{A_m|B_m=b}$ by $P_{A|B=b}$ on the right side,
for any $\epsilon > 0$,

\begin{align}
\lim_{m \to \infty} H(P_{A_m|B_m}) &\ge \sum_{b=1}^{M} p_B(b) \tilde{H}(P_{A|B=b}) - \epsilon \tag{54} \\
&> H(P_{A|B}) - 2\epsilon, \tag{55}
\end{align}

where (55) follows from (51). Since $\epsilon > 0$ is arbitrary, the
lemma is proved. ■
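Lower semicontinuity cannot be upgraded to continuity on countably infinite alphabets, which is why the entropy differences appear explicitly in (2). The sketch below uses a standard example with unconditional entropy (the special case of Lemma 6 with a trivial $B$). The family `spread_distribution` is an assumed construction for illustration: $P_m$ keeps mass $1 - 1/m$ on one symbol and spreads the remaining $1/m$ uniformly over $2^m$ other symbols, while the limit $P$ is a point mass with $H(P) = 0$.

```python
import math

def entropy_bits(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def spread_distribution(m):
    """P_m: mass 1 - 1/m on symbol 0, mass 1/m spread uniformly over 2**m symbols."""
    k = 2 ** m
    return [1.0 - 1.0 / m] + [1.0 / (m * k)] * k

def variational_distance_to_point_mass(probs):
    """V(P_m, P), where P puts all of its mass on symbol 0."""
    return (1.0 - probs[0]) + sum(probs[1:])

for m in (2, 4, 8, 16):
    probs = spread_distribution(m)
    print(m,
          round(variational_distance_to_point_mass(probs), 4),
          round(entropy_bits(probs), 4))
```

Here $V(P_m, P) = 2/m$ tends to zero, yet $H(P_m)$ equals one bit plus the binary entropy of $1/m$, so it stays above $1$ and never approaches $H(P) = 0$. The inequality of Lemma 6 holds strictly in the limit, and convergence in variational distance alone does not control the entropy.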
By Lemma 2 and Lemma 6, we are able to strengthen
Lemma 5 and give the following lemma.

Lemma 7: Assume $X - Y - Z$. If for any $\epsilon > 0$ and any
given $(\mathbf{y}, \mathbf{z}) \in U^n_{[YZ]\eta}$, $\mathbf{X}$ is drawn $\sim \prod_i p(x_i|y_i)$, then

$$\Pr\{D(Q_{X|YZ}\|P_{X|YZ}|Q_{YZ}) < \epsilon\} > 1 - \epsilon, \tag{56}$$

and

$$\Pr\{|H(Q_{X|YZ}) - H(P_{X|YZ})| < \epsilon\} > 1 - \epsilon \tag{57}$$

for $n$ sufficiently large and $\eta$ sufficiently small.

Proof: For any $\epsilon > 0$ and $P_{XYZ}$, there exists a
sufficiently small $\delta$ from Lemma 6 such that if

$$V(Q_{XYZ}, P_{XYZ}) < \delta, \tag{58}$$

then $H(Q_{X|YZ}) - H(P_{X|YZ}) > -\epsilon$. On the other hand, if
(50) is satisfied, then $\epsilon > H(Q_{X|YZ}) - H(P_{X|YZ})$, because the
divergence is nonnegative. Therefore, if both (50) and (58) are satisfied, then
$|H(Q_{X|YZ}) - H(P_{X|YZ})| < \epsilon$. When $n$ is sufficiently large and $\eta$ is
sufficiently small, Lemma 2 shows that

$$\Pr\Big\{V(Q_{XYZ}, P_{XYZ}) < \min\Big\{\delta, \frac{\epsilon}{2}\Big\}\Big\} > 1 - \frac{\epsilon}{2}. \tag{59}$$

Also, Lemma 5 shows that (50) is true with probability larger
than $1 - \frac{\epsilon}{2}$. Therefore, (57) can be shown from Lemma 1.
Similarly, (56) can be verified by Lemma 1 and Lemma 5 together
with (57).

■

Due to the following theorem, we just need to bound
two instead of eight quantities in (2) in order to verify that
$(\mathbf{x}, \mathbf{y}, \mathbf{z}) \in U^n_{[XYZ]\gamma}$.

Theorem 3: Assume $H(P_{AB})$ is finite. If
$\lim_{m \to \infty} V(P_{A_m B_m}, P_{AB}) = 0$ and $\lim_{m \to \infty} |H(P_{A_m B_m}) - H(P_{AB})| = 0$, then

$$\lim_{m \to \infty} |H(P_{A_m}) - H(P_A)| = 0. \tag{60}$$

Proof:

\begin{align}
\lim_{m \to \infty} H(P_{A_m}) &= \lim_{m \to \infty} \left[H(P_{A_m B_m}) - H(P_{B_m|A_m})\right] \tag{61} \\
&= H(P_{AB}) - \lim_{m \to \infty} H(P_{B_m|A_m}) \tag{62} \\
&\le H(P_{AB}) - H(P_{B|A}) \tag{63} \\
&= H(P_A), \tag{64}
\end{align}

where (63) follows from Lemma 6. On the other hand,
$\liminf_{m \to \infty} H(P_{A_m}) \ge H(P_A)$ because entropy is lower
semicontinuous [11]. Therefore, the theorem is proved. ■

Suppose $|H(Q_{XYZ}) - H(P_{XYZ})|$ and $D(Q_{XYZ}\|P_{XYZ})$ are
sufficiently small. In this case, $V(Q_{XYZ}, P_{XYZ})$ is small
from Pinsker's inequality, and Theorem 3 shows that all the
nonnegative quantities in (2) are also small.

C. Proof of Theorem 1

We first show that for any $\epsilon > 0$,

$$\Pr\{|H(Q_{XYZ}) - H(P_{XYZ})| < \epsilon\} > 1 - \frac{\epsilon}{2} \tag{65}$$

and

$$\Pr\{D(Q_{XYZ}\|P_{XYZ}) < \epsilon\} > 1 - \frac{\epsilon}{2} \tag{66}$$

when $n$ is sufficiently large and $\eta$ is sufficiently small.

Let $\eta = \frac{\epsilon}{2}$ so that $|H(Q_{YZ}) - H(P_{YZ})| \le \frac{\epsilon}{2}$ as $(\mathbf{y}, \mathbf{z}) \in U^n_{[YZ]\eta}$.
If $|H(Q_{X|YZ}) - H(P_{X|YZ})| < \frac{\epsilon}{2}$, then

$$\epsilon > |H(Q_{X|YZ}) - H(P_{X|YZ})| + |H(Q_{YZ}) - H(P_{YZ})| \ge |H(Q_{XYZ}) - H(P_{XYZ})|. \tag{67}$$

Together with Lemma 7, (65) follows from

$$\Pr\{|H(Q_{XYZ}) - H(P_{XYZ})| < \epsilon\} \ge \Pr\Big\{|H(Q_{X|YZ}) - H(P_{X|YZ})| < \frac{\epsilon}{2}\Big\} > 1 - \frac{\epsilon}{2}. \tag{68}$$

Since $\eta = \frac{\epsilon}{2}$, $D(Q_{YZ}\|P_{YZ}) \le \frac{\epsilon}{2}$ as $(\mathbf{y}, \mathbf{z}) \in U^n_{[YZ]\eta}$. If
$D(Q_{X|YZ}\|P_{X|YZ}|Q_{YZ}) < \frac{\epsilon}{2}$, then

\begin{align}
\epsilon &> D(Q_{X|YZ}\|P_{X|YZ}|Q_{YZ}) + D(Q_{YZ}\|P_{YZ}) \tag{69} \\
&= D(Q_{XYZ}\|P_{XYZ}). \tag{70}
\end{align}

Together with Lemma 7, (66) follows from

\begin{align}
& \Pr\{D(Q_{XYZ}\|P_{XYZ}) < \epsilon\} \tag{71} \\
&\ge \Pr\Big\{D(Q_{X|YZ}\|P_{X|YZ}|Q_{YZ}) < \frac{\epsilon}{2}\Big\} > 1 - \frac{\epsilon}{2}. \tag{72}
\end{align}

For any $\gamma > 0$, there exists a sufficiently small $\epsilon < \frac{\gamma}{8}$ from
Theorem 3 such that if (67) and (70) are satisfied, then all
the absolute values in (2) are less than $\frac{\gamma}{8}$, and hence, (2) is
satisfied. Therefore, by (65) and (66),

\begin{align}
& \Pr\{(\mathbf{X}, \mathbf{y}, \mathbf{z}) \in U^n_{[XYZ]\gamma}\} \nonumber \\
&\ge \Pr\{\{|H(Q_{XYZ}) - H(P_{XYZ})| < \epsilon\} \text{ and } \{D(Q_{XYZ}\|P_{XYZ}) < \epsilon\}\} \tag{73} \\
&> 1 - \epsilon \tag{74} \\
&> 1 - \gamma. \tag{75}
\end{align}

IV. Conclusion

A version of the Markov lemma which works on both
finite and countably infinite alphabets has been proved. We have
also demonstrated a method to ease the verification of jointly
unified typical sequences. These results readily generalize
the achievability parts of some existing coding theorems to
countably infinite alphabets, and they are potentially useful for
proving coding theorems that apply to both finite and infinite
alphabets.